DOCUMENT RESUME

ED 143 688                                              TM 006 444

AUTHOR         Haladyna, Tom
TITLE          Measuring Performance: Teacher-Made Tests.
INSTITUTION    Oregon State Dept. of Education, Salem.

11 ' 

NOTE           58p.
EDRS PRICE     MF-$0.83 HC-$3.50 Plus Postage.
DESCRIPTORS    Achievement Tests; Attitude Tests; Essay Tests;
               *Evaluation Criteria; Guidelines; *Guides; Multiple
               Choice Tests; Rating Scales; Student Evaluation;
               *Teacher Developed Materials; *Test Construction;
               Test Interpretation; Test Items; Test Reliability;
               *Tests; Test Validity

ABSTRACT

Among the new testing developments are the use of
objectives or goals in instruction, competency based approaches to
instruction, criterion referenced testing, and performance oriented
testing. These new approaches often emphasize individualized
learning; each student's progress is individually monitored by
comparison with clear statements of what students are expected to
learn. The careful monitoring of individual student progress required
in such a performance based system has created the need for a testing
technology which differs in many respects from typical practice.
These guidelines are intended to help teachers develop testing skills
which meet the demands of a competency based approach to instruction.
This guide includes information on fundamental concepts in testing,
construction of classroom tests, and measurement of attitudes. An
annotated bibliography of recent and significant contributions to
testing technology, a brief glossary of testing terms, and a list of
recommended sources for achievement test items are appended.
(Author/MV)



Documents acquired by ERIC include many informal unpublished
materials not available from other sources. ERIC makes every effort
to obtain the best copy available. Nevertheless, items of marginal
reproducibility are often encountered and this affects the quality
of the microfiche and hardcopy reproductions ERIC makes available
via the ERIC Document Reproduction Service (EDRS). EDRS is not
responsible for the quality of the original document. Reproductions
supplied by EDRS are the best that can be made from the original.






MEASURING PERFORMANCE:
TEACHER-MADE TESTS

Verne A. Duncan
State Superintendent of
Public Instruction

Oregon Department of Education
942 Lancaster Drive NE
Salem, Oregon 97310



- 097'Z 






STATEMENT OF ASSURANCE

Oregon Department of Education

It is the policy of the Oregon Department of
Education that no person be subjected to
discrimination on the basis of race, national origin,
religion, sex, age, handicap, or marital status in any
program, service, or activity for which the Oregon
Department of Education is responsible. The
Department will comply with the requirements of
state and federal law concerning nondiscrimination
and will strive by its actions to enhance the dignity
and worth of all persons.



/ 



»*5231 18771 0500 






FOREWORD

In June 1976, the State Board of Education adopted revised minimum
standards for Oregon public schools. A response to citizen
concerns regarding what is, in fact, expected of schools, the
standards call for a system of goal-based planning, which includes
testing and assessment procedures.

The Department of Education is committed to helping districts
implement the standards. Current and anticipated problems are
being identified, priorities set, and resources allocated.

One priority area centers on the assessment requirements found
in the standards. Measuring Performance: Teacher-Made Tests is
one of a series of publications dealing with assessment. It
focuses on helping teachers improve their techniques for building
tests to assess growth in student achievement. It is my hope that
this and other publications in the assessment series prove useful
in implementing district practices that will meet the intent of the
planning and assessment requirements. For further information,
contact the Department's Director of Evaluation and Assessment, 942
Lancaster Drive NE, Salem 97310, telephone 378-3074.

Verne A. Duncan
State Superintendent of
Public Instruction






ACKNOWLEDGEMENTS

Extensive field testing of this material has been conducted
with classroom teachers and administrators across
the State of Oregon. To those of you who have read and
reacted to the materials in whole or in part, thanks for
helping to make this document as easy to read and
understand as possible.

Special recognition is due Tom Haladyna, Associate
Research Professor, Teaching Research Division, State
System of Higher Education. Doctor Haladyna was principal
author of the initial draft of the document under a
contract with the Planning, Evaluation, and Assessment
Program, Oregon Department of Education.



TABLE OF CONTENTS

Foreword
Acknowledgements
Table of Contents
Introduction

I.    FUNDAMENTAL CONCEPTS IN TESTING
        Interpreting the Measurement
        Criteria for Test Quality
        Matching Items to Instructional Intent
        Types of School Outcomes
        Self-Quiz

II.   CLASSROOM TEST CONSTRUCTION: Selected Response Test Items
        Multiple Choice Items
          Selecting Items
          Writing Good Multiple Choice Items
          Other Deficiencies in Item Writing
        True-False Tests
        Matching Items
        Item Forms
        Self-Quiz

III.  CLASSROOM TEST CONSTRUCTION: Constructed Response Items
        When Should We Use the Essay Test?
        Writing a Short Answer Question
        Test Length and Format
        Scoring the Response
        Self-Quiz

IV.   CLASSROOM TEST CONSTRUCTION: Measuring Skills
        Observation
        Rating Scales
        Factors Affecting the Reliability of Ratings
        Checklists
        Self-Quiz

V.    MEASURING ATTITUDE
        Approaches to Measuring Attitudes
        Scoring and Analyzing Results
        Observation Methods
          General Observations
          Critical Incidents
        Self-Quiz

APPENDIX A  Annotated Bibliography
APPENDIX B  Glossary of Terms
APPENDIX C  Sources of Achievement Test Items




INTRODUCTION

Recent developments in teaching and testing may alter dramatically that which
we have called "traditional education." To some extent, these innovations
are responses to what appears to be a decline in school achievement at all
grade levels. Among these new developments are: the use of objectives or
goals in instruction, competency-based approaches to instruction, criterion-
referenced testing, and performance-oriented testing.

The new approaches often emphasize individualized learning; each student's
progress is individually monitored by comparison with clear statements of
what students are expected to learn. Since individualized instruction must
be flexible and responsive to individual needs, progress should not be
determined by how one student compares with another. Instead, the growth of
each student should be determined by comparison against previously determined
standards.

The careful monitoring of individual student progress required in such a
system has created the need for a testing technology which differs in many
respects from typical practice. These guidelines are intended to help
teachers develop testing skills which meet the demands of a competency-based
approach to instruction.






CHAPTER I: FUNDAMENTAL CONCEPTS IN TESTING

Any judgment about whether or not learning has occurred is an inference based
on measurement. The more accurate the measurement, the more likely the
inference will be correct. The need for information about how well children
have learned leads to the construction of exercises (items) which can be
used to assign numerical values to various levels of achievement. A
collection of such exercises is usually called a test. A test should be uniform
and standard in several ways; for example, when a test is administered to a
group of students, all should receive the same items under the same
conditions for about the same amount of time.

The sum of correct responses to the items on a test constitutes the score.
Scores may be compared to one another or to some desirable standard, often
called a "passing standard."

Interpreting the Measurement

Any measure (including test scores) must have a comparative basis for
interpretation. The basis for test scores can be either standard-referenced or
group-referenced. If there are 24 principles of science to learn in a
unit, with two test items for each principle, a test score of 32 on a 48-item
test could be reported either way. The standard-referenced score is about 67
percent; it might be inferred that the student understands about two-thirds
or 67 percent of the principles of science in that unit, as judged by test
results. If the acceptable standard is 75 percent of items of a given
difficulty, then 67 percent is not an acceptable level of achievement. The
standard could be set at 100 percent if it is desirable for students to
respond correctly to all items of this difficulty.

Using the group-referenced method, a score of 32 may be the highest in the
class. On a larger scale, it may be higher than the scores of 60 percent of
a cross-section of other sixth grade students. This approach is useful in
comparing students to a previously selected reference group, reassigning high
or low scoring students to different groups, or helping a student select a
course of study or investigate possible future occupations.
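
The two interpretations can be illustrated with a short calculation. The
following is only a minimal sketch in Python; the function names and the
ten reference-group scores are hypothetical and do not come from this guide.

```python
def standard_referenced(score, n_items, passing_percent):
    """Return (percent correct, whether the standard is met) for one student."""
    percent = 100.0 * score / n_items
    return percent, percent >= passing_percent


def group_referenced(score, reference_scores):
    """Return the percent of reference-group scores that fall below `score`."""
    below = sum(1 for s in reference_scores if s < score)
    return 100.0 * below / len(reference_scores)


# The example from the text: 32 correct on a 48-item test, 75 percent standard.
percent, passed = standard_referenced(32, 48, 75)

# Hypothetical reference group of ten scores (invented for illustration).
reference = [20, 24, 25, 27, 28, 29, 30, 31, 35, 40]
rank = group_referenced(32, reference)
```

The same raw score of 32 yields two different statements: one about the
standard, the other about standing within a group.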

Criteria for Test Quality

There are three major criteria by which the quality of any measure may be
judged: accuracy, precision, and efficiency.

In educational measurement, the accuracy with which a certain characteristic
is represented by scores on a given test instrument is called "validity."
Validity relates to the content of the test. It represents the degree to
which a test actually measures what it purports to measure, usually a change
in behavior which is inferred to have resulted from a particular set of
teaching/learning activities.



Precision is concerned with the amount of error associated with any
measurement. Measurement errors usually occur randomly, but are distributed
normally. Therefore, any particular measurement contains some error, large or
small, positive or negative. The frustrating fact is that the exact size of
the error and the direction in which it occurs, positive or negative, are
unknown for any particular measurement.

The degree of precision, or consistency, of measurement using a test
instrument is referred to as test reliability. Any individual test score is
composed of what might be called a "true" score value plus or minus a margin of
error. The more reliable a test is, the smaller the amount of measurement
error that is associated with each individual score. The reliability of a
test is especially important if scores are going to be used in a pass/no pass
situation (e.g., a situation in which a score of 63 or higher is a passing
score and 62 or less is not).
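
One common way to put a number on the "margin of error" is the standard
error of measurement from classical test theory, SD x sqrt(1 - reliability).
That formula is not given in this guide; it is brought in here as an
assumption, and the score spread of 10 points and reliability of 0.91 are
hypothetical values chosen only to make the arithmetic easy to follow.

```python
import math


def standard_error_of_measurement(score_sd, reliability):
    """Classical test theory estimate: SD * sqrt(1 - reliability)."""
    return score_sd * math.sqrt(1.0 - reliability)


def score_band(observed, sem):
    """Return the (low, high) band of one SEM around an observed score."""
    return observed - sem, observed + sem


def decision_is_uncertain(observed, cut_score, sem):
    """True when the passing score lies inside the band around the observed score."""
    low, high = score_band(observed, sem)
    return low <= cut_score <= high


# Hypothetical test: score SD of 10 points and reliability of 0.91.
sem = standard_error_of_measurement(10, 0.91)
# A student scores 62 where 63 is passing, as in the example above.
uncertain = decision_is_uncertain(62, 63, sem)
```

Under these assumed values the band around a score of 62 includes the
passing score of 63, which is why a single point should not be treated as a
sure difference between passing and not passing.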



In some test situations, the stress of the measurement itself is a source of
variation. This would be analogous to a situation in which attempts to
measure the length of a table somehow actually caused changes in its length.
If test scores are to be useful, they must be reliable across a variety of
circumstances and from one instance to another. Reliability, or the
consistency with which a trait can be represented by a test score, is essential to
good educational measurement.

Efficiency involves the time consumed and money spent in planning,
constructing, administering, scoring, and reporting test results. Efficiency is
considered in terms of the teacher and the student.

Testing practices which lead to highly accurate and precise measures are
generally inefficient. Test makers usually try to achieve a balance with
accuracy and precision on the one hand, and efficiency on the other.

Matching Items to Instructional Intent

The purpose of most teacher-made classroom tests is to measure changes in
achievement level that are a presumed result of teaching. These types of
tests are not usually employed to measure other related traits, such as
aptitude for learning, motivation, and attitude toward school. (However, such
[...] The measurement of attitudes is discussed further in [...]

[...]cated and best known device for connecting items with teaching intent. While
there are several sets of rules for what constitutes a good instructional
goal, any statement which communicates the instructional intent of the
teacher can be used to link that intent with the test item. More
specifically, an objective or goal can be any statement that describes what the
student will be required or expected to do under certain conditions (learning
outcomes). The following pages provide information about types of learning
outcomes and describe techniques for writing outcome statements which
satisfy measurement requirements. Four types of outcomes are presented in
Table 2.




                                    Table 2

             Types of Outcomes and Related Measurement Techniques

DOMAIN           Cognitive                Cognitive               Affective               Psychomotor

OUTCOMES         Knowledge                Skills                  Attitudes               Skills

ASPECTS          Recall                   Performance                                     Performance
                 Comprehension or         Products
                 Higher Level
                 Behaviors (1)

NATURE OF TRAIT  Inferred in a paper-     Inferred or             Inferred or             Directly Observed
                 and-pencil instrument    Directly Observed       Directly Observed

TECHNIQUES       Selected Response        Observation             Observation             Observation
                  1. multiple choice      Rating Scales           Rating Scales           Rating Scales
                  2. true-false            1. numerical scales     1. numerical scales     1. numerical scales
                  3. matching              2. graphic scales       2. graphic scales       2. graphic scales
                 Constructed Response      3. descriptive scales   3. descriptive scales   3. descriptive scales
                  1. completion           Checklists              Checklists              Checklists
                  2. short answer essay                           Anecdotal Records
                  3. extended answer
                     essay





(1) Methods of measuring higher level behaviors are given in: Bloom (1956),
Miller, Williams, and Haladyna (in press), or Sanders (1966); see Appendix A.






Types of School Outcomes

It is generally accepted that the mission of the public school is to help
each student acquire the knowledge, skills, and attitudes necessary to
function effectively in a variety of life roles. Knowledge and skills are
usually associated with cognitive behavior. Psychomotor skills are related
to movement, such as marching, but are often found combined with the others,
as in the playing of a musical instrument. Attitudes are merely one aspect
of affective behavior which may or may not be significantly related to
cognitive behavior. There has been increasing interest in measuring student
attitude as a result of the growing conviction that a student's attitude
toward school and society may be as important as knowledge or skills acquired.

Knowledge. Knowledge is an attribute which can only be inferred from student
behavior; it is not directly observable; it is intangible. The acquisition
of knowledge is usually measured through responses to paper-and-pencil
tests, oral responses, or an observed physical response. Some paper-and-
pencil instruments use selected response techniques, including multiple
choice, true-false, or matching. The latter two are actually a variation
of the first. Another technique is to ask students to construct their
answers to questions rather than to select an appropriate response.
Constructed response techniques include completion, short answer essay, and
extended answer essay. While constructed response tests have some
limitations, they can be useful in measuring school achievement. Selected response
testing is discussed further in Chapter II and constructed response testing
in Chapter III.

Skills. Measuring skills typically involves the observation of a performance
or product. The techniques most used for measuring skills are (1)
observation, (2) rating scales (rendering judgments on some numerical scales), or
(3) checklists (marking whether a sequence of behaviors or traits is present
or absent). Techniques for measuring skills are presented in Chapter IV.

Attitudes. The third type of outcome, attitudes, is in the affective domain.
As shown in Table 2, the measurement of attitude involves many of the same
techniques that were used to measure skills. There is one additional
technique, however: anecdotal records. The measurement of attitudes is
discussed in greater detail in Chapter V.

A variety of techniques can be used to measure most outcomes; in some cases,
however, the necessary technique is self-evident.
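
The pairing of outcome types with techniques in Table 2 can also be kept as
a simple lookup. The sketch below is one possible arrangement in Python; the
names TECHNIQUES and techniques_for are hypothetical, and placing anecdotal
records under attitudes follows the text above.

```python
# Outcome types and techniques as listed in Table 2.
RATING_FAMILY = ["observation", "rating scales", "checklists"]

TECHNIQUES = {
    ("cognitive", "knowledge"): [
        "multiple choice", "true-false", "matching",
        "completion", "short answer essay", "extended answer essay",
    ],
    ("cognitive", "skills"): list(RATING_FAMILY),
    ("affective", "attitudes"): RATING_FAMILY + ["anecdotal records"],
    ("psychomotor", "skills"): list(RATING_FAMILY),
}


def techniques_for(domain, outcome):
    """Look up candidate techniques; raises KeyError for an unknown pair."""
    return TECHNIQUES[(domain.lower(), outcome.lower())]


# Example: which techniques are candidates for an attitude outcome?
attitude_techniques = techniques_for("Affective", "Attitudes")
```

A lookup of this kind only narrows the choice; the teacher still selects
the technique that best fits the particular outcome statement.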



SELF-QUIZ

Match the techniques on the right with the statements of instructional intent
on the left.

      OUTCOME STATEMENT                                    TECHNIQUES

  1.  Given pictures of animals, the student               a. observation
      will identify the type of invertebrate or            b. short answer essay
      vertebrate.                                          c. multiple choice
                                                           d. rating scales
  2.  Given sentences, the student will select             e. checklists
      the correct verb tense.                              f. completion

  3.  The student will describe the action of
      molecules in melting ice.

  4.  The student will determine the speed of a
      falling body when given the number of
      seconds the body has fallen and the law of
      falling bodies.

  5.  The student will provide a rationale for
      flu shots for the aged.

  6.  The student will correctly reassemble a
      motor.

  7.  Students will be able to complete a manual
      dexterity exercise in less than two
      minutes.

  8.  Name and briefly describe, in writing,
      four aspects of effective communication.

  9.  How effective is the use of lubrication in
      improving the mechanical efficiency of a
      machine?

 10.  How well has the student used alliteration
      in his poem?

OUR ANSWERS

  1.  C, multiple choice was selected because knowledge is being measured and
      it would be efficient to provide options for the student for each
      picture. A completion format would also be appropriate here.

  2.  C, again selecting the correct answer seems most appropriate for this
      knowledge outcome.

  3.  B, since description is required, the short answer essay seems best.

  4.  F, this is a higher level of knowledge requiring careful analysis and a
      constructed response.

  5.  B, the rationale is a knowledge outcome, best presented in a short
      answer essay.

  6.  E, implicit here is that a sequence of actions must occur and must be
      performed correctly in order for the motor to be correctly assembled.

  7.  A, this is a simple observation; the teacher is interested in a
      psychomotor outcome.

  8.  B, a measurement of knowledge which is most appropriately expressed in
      a short answer essay.

  9.  D, appears to ask for a rating of effectiveness, a performance-oriented
      outcome.

 10.  D, an estimate of how well the student's product exhibits a desired
      learning outcome.






CHAPTER II: CLASSROOM TEST CONSTRUCTION

Selected Response Test Items

Classroom achievement tests should measure learning in relation to
teaching objectives, using the most comprehensive sampling possible. Selected
response tests (i.e., multiple choice, true-false, and matching)
offer the greatest potential for sampling because more questions can be asked
in a set length of time. A longer test (i.e., one with more items) is
usually more accurate and precise. Thus the selected response technique is
most often used by measurement experts and teachers.

Potential Advantages

  + Easy to score.

  + Can be objectively scored; that is, the answer is predetermined and
    any grader should arrive at the same test score.

  + Can be used to measure simple recall or higher level knowledge.

  + Generally requires less time per item on the part of the student;
    therefore, more questions can be answered and greater sampling of
    the material can be achieved.

Potential Disadvantages

  - Correct answers may sometimes be inferred from the item itself unless
    questions are carefully worded.

  - Some items test recall of trivial knowledge.

  - Time-consuming and difficult to construct accurate and precise
    items.

  - Correct answers may be achieved by guessing.

These disadvantages are usually considered minor in significance. For
example, guessing plays a small part in most tests. If there are four
choices for each item in a well-constructed 100-item multiple choice test, a
student lacking knowledge would be expected to score about 25 percent; on the
average, students would guess correctly once in every four times. Test
scores should be interpreted with this in mind. While the second and third
disadvantages are more common, they can be overcome. (See Appendix A for
titles of guides to writing multiple choice items for higher level
knowledge.) While these tests are more difficult to construct, they are easier to
score. Items that prove useful can be used year after year, provided the
teaching objectives do not change.
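
The guessing arithmetic above can be checked directly. This is a minimal
sketch in Python that assumes every answer is a blind, independent guess
(a simplification; real students usually hold partial knowledge). The
function names and the target of 40 correct are hypothetical.

```python
from math import comb


def expected_chance_score(n_items, n_options):
    """Expected number correct when every answer is a blind guess."""
    return n_items / n_options


def prob_at_least(n_items, n_options, k):
    """Probability of k or more correct by blind guessing (binomial tail)."""
    p = 1.0 / n_options
    return sum(
        comb(n_items, i) * p**i * (1 - p) ** (n_items - i)
        for i in range(k, n_items + 1)
    )


# The example from the text: 100 items with four options each.
chance = expected_chance_score(100, 4)
# How likely is a blind guesser to reach 40 correct? (40 is an arbitrary target.)
p_40 = prob_at_least(100, 4, 40)
```

The expected chance score is 25 of 100, matching the text, and scores well
above that level become very unlikely by guessing alone on a test of this
length.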




Multiple Choice Items

Multiple choice items are usually written in one of two ways:

      Mary had a little                    item stem
        a.  too much to eat                foil
       *b.  lamb                           correct answer
        c.  goat                           foil
        d.  brother                        foil

      What did Mary have?                  item stem
        a.  too much to eat                foil
       *b.  a little lamb                  correct answer
        c.  a little goat                  foil
        d.  a tiny brother                 foil

Note the item stem. Wrong answers are foils (distractors). The correct
answer and the foils are called options. Well-written items have foils
which are likely to be chosen by students who have not yet learned
the objective which the item was designed to measure. That is, all options
should be feasibly correct responses if the student does not know the
material.

The first item example has an item stem which is part of a sentence and is
completed with the options. The second item stem is a question, and the
options answer the question. All options should be parallel in grammatical
construction and length.

Both of the following items are examples of how not to construct a multiple
choice question.

      Jack and Jill
        a.  were two nice kids
        b.  boy and a girl
        c.  They were going down the hill
        d.  I don't know

      This item illustrates incorrect grammatical construction. Only
      option "a" completes the sentence.

      Where did the cow jump?
        a.  3-1/2" for a new world's record
        b.  over the moon
        c.  She jumped over the fence
        d.  wherever she wanted

      Each option has a different grammatical construction.



Selecting Items

Many items already exist that could accurately measure the teacher's
instructional intent. These can be collected, organized, and used to great
advantage. It saves teacher time in constructing items and helps build effective
and accurate tests. Appendix C lists some sources for items. The teacher
might even ask students to help write items. Provide students the item stems
and ask them to suggest options. Students can also be helpful critics as
teachers rewrite to improve the quality of particular test items.

Writing Good Multiple Choice Items

Procedures:

  1.  Identify the concept or objective to be tested.
  2.  Write the item stem.
  3.  Write the correct answer.
  4.  Write the foils parallel in structure to the correct answer.

Item writing gets easier with practice. The following exercise illustrates
some pitfalls that can be avoided.



EXERCISE

Before you take this test, be forewarned that you are expected to score 100
percent. Mark the letter corresponding to your answer in the space provided.
Try to "figure out" the right answers using the clues given.

___ 1.  How often were the seven Cities of Cibola discussed in early
        Spanish literature?
          a.  seldom
          b.  always
          c.  never
          d.  all of the time

___ 2.  The beta-binomial distribution is a function of which of the
        following theoretical distributions?
          a.  bivariate normal
          b.  multivariate normal
          c.  poisson
          d.  beta

___ 3.  Which Romantic poet is best known for his "amorous" activities?
          a.  O. J. Simpson
          b.  Byron
          c.  "the Fonz"
          d.  Henry Kissinger

___ 4.  The thermotropple adjustment is
          a.  sixteen
          b.  8 + 8
          c.  20
          d.  4 x 4

___ 5.  The most defensible reason for using caution in handling
        hydrochloric acid is the
          a.  heat
          b.  smell
          c.  [...]
          d.  fact that when it comes into contact with the skin it will
              burn you.

OUR ANSWERS

1. a    2. d    3. b    4. c    5. d



The above items represent five pitfalls to avoid in the construction of
foils for multiple choice items.

  1.  Specific determiners. The use of absolutes ("never," "always,"
      "totally," and "completely") suggests to students that these options
      are not correct answers. Students should be demonstrating their
      learning rather than their cleverness at deciphering answers to poorly
      constructed questions.

  2.  Cognates. A clue can give away the answer, often in the form of an
      option which resembles some word or phrase in the item stem.

  3.  Silly or ridiculous foils. This form of humor may provide comic
      relief during a test, but as a technique, it also may give the answer
      away.

  4.  Equivalent foils. If only one answer is allowed, and two options are
      equally correct, then neither can be the right answer. This logic
      is often practiced by students.

  5.  Longest options are often the correct options. Poorly written items
      often have brief foils, thus increasing students' chances to make a
      lucky guess.
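
Some of these pitfalls are mechanical enough that a draft item can be
screened for them automatically. The sketch below, in Python, is a rough
illustration and not a method from this guide: the word list comes from
pitfall 1, while the four-letter prefix test for cognates and the "more than
twice as long" test for length are arbitrary assumptions. Silly foils and
equivalent foils still require a human reader.

```python
ABSOLUTES = {"never", "always", "totally", "completely"}


def words(text):
    """Lower-case words with surrounding punctuation stripped."""
    return {w.strip('.,;:?!"\'()').lower() for w in text.split()} - {""}


def flag_pitfalls(stem, options):
    """Return a list of (option_index, warning) pairs for one item."""
    warnings = []
    stem_words = {w for w in words(stem) if len(w) > 3}
    lengths = [len(o) for o in options]
    for i, option in enumerate(options):
        opt_words = words(option)
        if opt_words & ABSOLUTES:
            warnings.append((i, "specific determiner"))
        # Crude cognate test: an option word shares a 4-letter prefix with a stem word.
        if any(s[:4] == o[:4] for s in stem_words for o in opt_words if len(o) > 3):
            warnings.append((i, "possible cognate of the stem"))
        others = lengths[:i] + lengths[i + 1:]
        if others and lengths[i] > 2 * max(others):
            warnings.append((i, "much longer than the other options"))
    return warnings


# Item 2 of the exercise above: the option "beta" echoes the stem.
item2 = flag_pitfalls(
    "The beta-binomial distribution is a function of which of the "
    "following theoretical distributions?",
    ["bivariate normal", "multivariate normal", "poisson", "beta"],
)
```

A flag is only a prompt to reread the item; an option can legitimately
share a word with the stem or be longer than its neighbors.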

Other Deficiencies in Item Writing

The use of technical language and long passages ending with a question
creates laborious reading difficulty for weak readers. It also tries
students' patience, creates anxiety, and reduces motivation. Such
questions take up too much testing time. To illustrate:

      Fran is a high school student who wants to get good grades but has
      trouble studying due to distractions such as watching television,
      telephone calls from friends, and writing notes to her boyfriend. As a
      counselor, which of the following procedures for developing and
      maintaining motivation would you recommend to Fran?

        a.  Interfere as little as possible but provide learning experiences
            when requested.
        b.  Frequently remind Fran of the eventual value of what she is to
            learn.
        c.  Tell the parents not to interfere; Fran is probably engaging in
            nonstudy behavior to irritate them.
        d.  Pay Fran a specified amount for every hour of uninterrupted study.



The multiple-multiple choice item can also be confusing and, hence,
time-consuming.

      When Flash arrived in Mongo, he
        I.    went straight to the Palace,
        II.   discovered that the Clay People had kidnapped Dale,
        III.  found out that Doctor Zarkoff was being held in Ming's Palace,
        IV.   was told that Prince Barron had been bickering with King Vulcan
              again.

        a.  I and II
        b.  II and III
        c.  I and IV
        d.  II, III, and IV

This item is far too long. In addition, the clever student can figure out
the best answer by noting that "II" is most often repeated, and therefore,
option "c" is probably not correct. If the student knows that "I" is not
correct, then "d" must be the correct answer.



Some item writers use tricks to lead students to select foils, a practice
that is NOT recommended. The intent of the item is to measure achievement,
not cleverness. Items which are direct and simply phrased yield the most
valid information about student achievement. Tricks upset many students,
consequently reducing the overall precision of the achievement measure.

Questions involving opinions, values, and attitudes should not be included in
an achievement test. It is difficult to treat opinions as facts and test
accordingly. If an opinion is to be tested, however, it should be qualified
as in the statement below.

      According to lectures, the best source of protein is:



Finally, options should be presented in random order. Item writers can slip
into a pattern (1. A, 2. B, 3. C, 4. D) or spell words with options (1. B,
2. A, 3. D). Students who have not learned look desperately for clues.
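
Randomizing the position of the correct answer is easy to do mechanically.
The sketch below uses Python's standard random module; the function name
shuffle_options is hypothetical, and the fixed seed appears only so the
sketch is repeatable.

```python
import random


def shuffle_options(options, correct_index, rng):
    """Shuffle options; return (shuffled options, new index of the correct answer)."""
    order = list(range(len(options)))
    rng.shuffle(order)
    shuffled = [options[i] for i in order]
    return shuffled, order.index(correct_index)


# The seed is fixed here only so the sketch is repeatable.
rng = random.Random(1977)
options = ["too much to eat", "a little lamb", "a little goat", "a tiny brother"]
shuffled, key = shuffle_options(options, 1, rng)
```

The returned key index keeps the answer key in step with the new order, so
scoring is unaffected by the shuffle.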

True-False Tests

True-false questions, a form of multiple choice, are often useful. A greater
number of true-false questions can be asked in an hour than any other type of
question, making possible a wider range of content sampling. These tests are
easier for many teachers to write than multiple choice, creating better
efficiency. Although achievement in higher cognitive areas also can be
measured with true-false questions, it is difficult. The primary disadvantage
of the true-false item is its high guess level. A student who guesses
randomly will get about 50 percent of the items correct. This problem can be
alleviated, however, if one employs a great many items and is careful in the
interpretation of test results. For example, a score of 54 percent on a
true-false test would indicate a low level of achievement, with 85 percent a
moderate level. By contrast, 54 percent on a multiple choice test would
indicate moderate achievement and 85 percent would be fairly high.
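
The effect of the guess level on interpretation can be illustrated with a short calculation. The sketch below is not part of the original guide. It assumes a four-option multiple choice item and uses a simple rescaling, (score - chance) / (1 - chance), to express a score as the share of the distance between the chance level and a perfect score. The function names are hypothetical.

```python
def chance_level(num_options: int) -> float:
    """Expected proportion correct for a student who guesses blindly."""
    return 1.0 / num_options


def above_chance(score: float, num_options: int) -> float:
    """Rescale a proportion-correct score so 0 is the chance level and 1 is perfect."""
    chance = chance_level(num_options)
    return (score - chance) / (1.0 - chance)


if __name__ == "__main__":
    for score in (0.54, 0.85):
        tf = above_chance(score, num_options=2)  # true-false: two options
        mc = above_chance(score, num_options=4)  # assumed four-option multiple choice
        print(f"raw {score:.0%}: true-false {tf:.2f}, multiple choice {mc:.2f}")
```

Under these assumptions a raw score of 54 percent sits barely above chance on a true-false test but well above chance on a four-option test, which is consistent with the interpretation given above.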

Some tips for writing true-false items are listed below:

1. Deal with statements that are dichotomous in nature; avoid items having
shades of meaning or degrees of comparison.

Good: The planet with two moons revolving around it is Mars.
Bad: One of the nicest things about Slobbovia is its climate.

The first item is clearly true; there is no conjecture. The second item
is subject to some criteria for judging "niceness." It is a comparative
judgment not well suited to true-false questions.

2. Avoid the use of negatives or double negatives. It has been shown that
using negatives takes up more testing time and makes the test more difficult
than the resulting information is worth.







Bad: You should not pour hydrochloric acid into water.
Good: A dangerous reaction will occur when hydrochloric acid is poured
into water.

The careless, but knowledgeable, may omit reading "not" and answer
incorrectly. The second phrasing of the question avoids the problem.

3. Avoid lengthy sentences. Lengthy sentences are a test of the student's
patience, concentration, and reading ability. Such questions are seldom
effective measures.



Bad: A coding system may be defined as a set of contingently related,
nonspecific categories which are not readily identifiable to the
audience for which the system was intended but which are identifiable
to the creators of the system.

Good: One example of a coding system is Bartlett's memory schemata.

In both cases, a coding system as it is related to thinking needs to be
defined. While the first example is technically accurate, it is wordy,
convoluted, and difficult to follow. The second example is accurate,
brief, and easy to read.

4. All true-false items should contain single ideas. Trying to squeeze too
many ideas into one question is ineffective.



Good: All squares have interior angles which sum to 360 degrees.

Bad: Squares, rectangles, and trapezoids have interior angles which
sum to 360 degrees, and all quadrilaterals have four sides.

In the first example one concept is tested. In the second, at least
three concepts are tested. The second example could be rewritten to
produce three legitimate true-false items of merit:

a. Squares, rectangles, and trapezoids are the three members of the
quadrilateral family.

b. All quadrilaterals have interior angles which sum to 360 degrees.

c. All quadrilaterals have four sides.

An excellent resource on writing true-false items is Ebel's Essentials
of Educational Measurement, 1972 (see Appendix A).

Matching Items

When material is subject to more than four or five options and several item
stems, matching items may be most efficient. In the following example,
students are to identify which planets in the solar system have certain
characteristics.



1. The speediest orbiting planet        a. Mercury
2. Has the densest atmosphere           b. Venus
3. Has traces of oxygen                 c. Earth
4. Is a satellite                       d. Mars
5. Is considered a moon                 e. Jupiter
6. Has the most moons                   f. Saturn
7. Has two moons                        g. Uranus
                                        h. Pluto
                                        i. Neptune
                                        j. None of these
                                        k. All of these



True-false and matching items, as variations on multiple choice, are effective
for certain situations. The suggestions given earlier for multiple
choice items would apply as well to these types of items.



Item Forms

One way to write multiple choice questions is to follow a model. Many test
items can be created by simply inserting new words or elements into the
model.

Consider, for example, the math question:

What is the value of "X" in the equation X + 4 = 10?

This item serves as a model, or "form," for a whole "family" of items. By
replacing the number +4 with any integer between -9 and +9, and replacing the
number 10 with any number from 0 to 19, nearly four hundred questions can be
created that measure the same domain of knowledge.
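
The item form above can be sketched as a small generator. This is an illustration only, not a procedure given in the guide: the function names are hypothetical, the two ranges are read as inclusive, and an addend of zero is kept in the family. With 19 possible addends and 20 possible totals the form yields 380 items, in line with the "nearly four hundred" mentioned above.

```python
from itertools import product


def format_item(addend: int, total: int) -> str:
    """Fill the item form 'X + a = b' with one addend and one total."""
    sign = "+" if addend >= 0 else "-"
    return f'What is the value of "X" in the equation X {sign} {abs(addend)} = {total}?'


def item_family() -> list[tuple[str, int]]:
    """Return every (question, answer) pair the form can produce."""
    addends = range(-9, 10)  # integers from -9 to +9, read as inclusive
    totals = range(0, 20)    # numbers from 0 to 19, read as inclusive
    return [(format_item(a, b), b - a) for a, b in product(addends, totals)]


if __name__ == "__main__":
    items = item_family()
    print(len(items))
    print(items[0])
```

Storing the answer alongside each question keeps the scoring key in step with the items, whichever subset of the family is drawn for a given test.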

Another example:

Which of the following sentences has an INCORRECT use of the verb?

a. Tomorrow, Mildred will be on vacation.

b. Yesterday, Melvin was tired.

c. Today, Fern weren't with it.

d. This morning, Jean wasn't here.

Sentences from a reading text could be identified and arranged into
four-option sets. The verb tense in one of the options would be changed to
provide an appropriate option.

The use of item forms is somewhat limited, for the present, to topics which
are easily described (such as mathematics, spelling, English usage). However,
the item form is one way to quickly create a great many selected response
items.



Self-Quiz

Select the best answer. Write the letter corresponding to your choice in the
space provided at the left.

____ 1. Which of the following is typically the most efficient form for the
testing of knowledge for large groups of students?
a. product assessment
b. performance testing
c. selected response
d. constructed response

____ 2. The most important reason for using the multiple choice item is
because of its
a. potential for better sampling of content
b. ease of scoring
c. flexibility in measuring behavior
d. objectivity






____ 3. Option is to foil as
a. answer key is to raw score
b. objective test is to essay test
c. possibility is to mistake
d. multiple choice is to true-false
e. objective is to subjective

____ 4. Which of the following is not true of an objective item as
compared with an essay item?
a. has high scoring consistency from scorer to scorer
b. can be scored quickly
c. can be prepared quickly
d. free of factors of skill in expression and penmanship
e. free from opportunities for bluffing



For each of the following questions write either "+" for true or "0" for
false in the space provided at the left.

____ 5. The stem of a multiple choice item should state, or clearly imply, a
specific direct question.

____ 6. The stem of a multiple choice item should be limited to a single
sentence or sentence fragment.

____ 7. One option in each multiple choice item should be so absurd that it
would be chosen only by a student who is guessing blindly.

____ 8. Well-constructed true-false items are no more subject to guessing
than are well-constructed four-choice multiple choice items.

____ 9. It is often difficult and seldom advantageous to make all of the
responses to a multiple choice item parallel in point of view,
grammatical structure, or general appearance.

____ 10. Multiple choice items having negatively stated stems (with the word
not playing a crucial role) tend to be better items.

____ 11. A well-written item can cause clever students to find the correct
answer by a process of elimination of incorrect answers.

____ 12. Good true-false items express single, not multiple ideas.



OUR ANSWERS

2. a   6. +   7. 0   8. 0   10. 0   11. 0   12. +



CHAPTER III: CLASSROOM TEST CONSTRUCTION

Constructed Response Items

A test item which requires the student to create a response, rather than to
select one, is called a constructed response item. Math computation items
are often in this category, as are the more complex story problems frequently
used in more advanced math and science content areas. The counterpart from
other content areas is the completion question. The essay test is another
type of constructed response test. Items are relatively easy to write, but
often difficult to score due to the variety of correct responses available to
students.

In many respects the essay test is the most misused of all constructed
response types. Many teachers use it because it involves the student in
written composition, not realizing the impact on measurement validity. The
essay question should not be used to test knowledge of a given topic and to
measure writing skill simultaneously. A student may know the answer, but the
ability to communicate in writing may be lacking. The item writer must first
decide whether writing skill or knowledge is to be measured. A short or
extended answer essay may provide useful information about student knowledge
of a subject; writing skills are more effectively measured by means of rating
scales or checklists.

When Should the Essay Test Be Used?

Classroom achievement tests, when they are successful, accurately measure
specific learning outcomes. As shown in earlier chapters, selected response
tests hold the most potential for adequate sampling of content. The
constructed response test is often limited to a narrower range of content.

A constructed response test must, of course, be used if the instructional
outcome to be measured requires a written response. Consider the following
example:

The student will describe in writing at least four features of the
topography of Mars.

To test achievement of this objective, an essay question would ask the
student to describe in writing four features of the topography of Mars.



The advantages and disadvantages of an essay test format are listed below:

Advantages

+ comparatively easy to prepare
+ lends itself easily to measuring higher level cognitive knowledge
+ may yield additional insights (or measures) of other learning

Disadvantages

- hard to score even when following the recommended procedures
- is subject to a number of biases in scoring
- does not offer as good a potential for adequate coverage of a content
  domain

The extended answer essay is sometimes selected to measure the degree to
which a student can select, organize, and synthesize material into a cohesive
response. However, the extended answer essay is probably the least efficient
in measuring school achievement because of the narrow ranges of content
sampled, the confounding effect of writing ability, and the lack of precision
in scoring.

The short answer essay is probably the most effective of the constructed
response items; more questions can be asked in a given time period, as
compared to the extended answer essay, and consequently more learning
outcomes can be measured.

Writing a Short Answer Question

The following steps should be followed in writing a short answer essay
question:

1. Determine in advance the concept or objective (learning outcome) to be
tested.

2. Paraphrase the familiar material. Do not use verbatim material.

3. Use such verbs as "compare, illustrate, give examples of," so the
student understands the nature of the task.

4. Make the questions as unambiguous and clear as possible.

5. Sample the content as fairly as possible. Do not overload the test with
items from one particular area unless students have been prepared in
this respect.



Perhaps the greatest difficulty in writing short answer essay questions is
indicating the desired extent of the answer. Test writers often leave too
much open to interpretation. For example:

Bad: How did Polk become President?

Better: Describe factors which contributed to Polk becoming President.

Best: List and briefly describe four major factors in Polk's life which
contributed importantly to his becoming President.

The best short answer essay questions are detailed and clear. Less detailed
questions lead to answers which, though still credible, may vary greatly from
the model response. A well-written short answer essay question helps avoid
the "creative" answer that is marginally acceptable.



Test Length and Format

Test accuracy by far is the most overriding concern to the test writer;
learning outcomes must be fairly sampled to achieve accuracy. There are
several options in essay testing in a fixed time period. With 45 minutes to
test, ten short answer essay questions should provide enough sampling for
accuracy. A few extended answer essay questions would not sample enough
content. Avoid asking students to choose ten of twelve short answer essay
questions, since this reduces the consistency of measurement from one student
to another. (It is hard to write questions of "equal" difficulty.)
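
One way to see why optional questions reduce consistency is to count how many different tests the students could end up taking. The following is a minimal sketch using only the Python standard library; the function name is hypothetical and the count is an illustration, not a figure given in the guide.

```python
from math import comb


def distinct_tests(offered: int, required: int) -> int:
    """Number of different question sets a student could choose to answer."""
    return comb(offered, required)


if __name__ == "__main__":
    print(distinct_tests(12, 10))  # students choose ten of twelve questions
    print(distinct_tests(10, 10))  # all ten questions are required
```

When students choose ten of twelve questions there are 66 possible question sets, so two students may be measured on noticeably different content. When every question is required there is only one.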

Scoring the Response

While the constructed response exam may be relatively easy to prepare, it is
more difficult to score. The following suggestions can help realize
effective and reliable scoring.

• Assign points to each question when writing the test. Give credit for
  the extent to which the student's answer fulfills the requirements of
  the model answer.

• Wherever possible, keep scoring anonymous. If not possible, keep
  in mind that knowing who wrote the answer can prejudice the scoring
  procedure.

• Prepare model answers for each question and use these answers in
  scoring. Sharing the model answers with the students after the test helps
  make the test a teaching tool as well as a measure.

• Try to keep poor grammar, penmanship, or spelling from biasing your
  judgment of the knowledge elicited by the item. If these skills are to
  be rated, they should be rated separately.

• If each item has a different weighting, be certain students understand
  this on beginning the test.

• Score one question at a time for all students, then proceed to the next
  question. This approach helps maintain consistency in scoring.

• Write comments on each student's paper. Offer advice, criticism,
  praise, suggestions. Students appreciate such feedback.

• If there are several sets of papers, shuffle them to avoid the tendency
  of downgrading earlier or later papers. Take rests and review the model
  answers to prevent fatigue and altering standards.
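
Three of the suggestions above (anonymous scoring, one question at a time, and shuffled papers) can be combined into a simple scoring plan. The sketch below is only an illustration under stated assumptions: the function name, the paper codes such as "P01", and the use of a random seed are all hypothetical and do not come from the guide.

```python
import random


def scoring_plan(student_names, num_questions, seed=None):
    """Return (code_key, plan).

    code_key maps an anonymous paper code to a student name; it is set
    aside until scoring is finished.
    plan is a list of (question_number, paper_code) pairs: one question at
    a time, with the papers reshuffled before each question.
    """
    rng = random.Random(seed)
    names = list(student_names)
    rng.shuffle(names)
    code_key = {f"P{i + 1:02d}": name for i, name in enumerate(names)}

    plan = []
    for question in range(1, num_questions + 1):
        codes = list(code_key)
        rng.shuffle(codes)  # a fresh paper order for every question
        plan.extend((question, code) for code in codes)
    return code_key, plan


if __name__ == "__main__":
    key, plan = scoring_plan(["Fran", "Mildred", "Melvin"], num_questions=2, seed=1)
    for question, code in plan:
        print(f"score question {question} on paper {code}")
```

Reshuffling before each question means no paper is always read first or last, and the scorer sees only the code until the key is consulted at the end.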

While selected response tests are usually the more efficient, constructed
response tests can also be excellent measures when the class is small or the
cognitive level of knowledge to be tested is high. (The essay format is more
often used in college and graduate school courses.) If the short answer
essay test is carefully written and scored, it can be a reasonably effective
measurement tool.



Self-Quiz

Select the best answer. Write the letter corresponding to your choice in the
space provided at the left.

____ 1. The major similarity between selected and constructed response
items is that
a. both yield high reliability estimates
b. both are efficient with respect to administration and scoring
c. neither is completely valid
d. both are measures of school achievement

____ 2. One of the major difficulties with the essay item is that it
a. fails to obtain responses that differentiate among examinees
b. often fails to set uniform tasks for all examinees
c. seldom samples what is to be tested
d. suffers from the fact that examinees attempt to guess the
   correct response






____ 3. In the scoring of essay examinations, all of the following are
considered desirable practices except to
a. reduce the mark for poor spelling or penmanship
b. prepare a scoring key and standard in advance
c. remove or cover pupils' names from the papers
d. make individual comments on each student's paper

____ 4. The greatest advantage of short answer essay tests over selected
response tests is
a. the ease with which the test items can be constructed
b. the ease and accuracy with which such tests can be standardized
c. the ease with which the test results are interpreted
d. the better sampling of content

For each of the following questions, write either "T" for true or "F" for
false in the space provided at the left.

____ 5. The best procedure in scoring an essay test is to read all questions
on one student's paper before starting to read a second
student's paper.

____ 6. A one-hour essay test composed of three questions requiring
extended answers is likely to be more accurate than a one-hour essay
test composed of twelve questions permitting much shorter answers.

____ 7. To make essay test scores objective in meaning is to defeat the
purpose of essay testing.

____ 8. Different types of test items (essay, short answer, true-false,
multiple choice, etc.) must be used to test different levels of
cognitive behavior.

OUR ANSWERS 






CHAPTER IV: CLASSROOM TEST CONSTRUCTION
Measuring Skills




For several decades the paper-and-pencil test has been the primary vehicle
for measuring educational growth; it is only recently that a great deal of
effort has gone into developing performance-based tests. The essence of the
performance-based test is that if a performance or skill is to be measured,
there should be direct, not indirect, means of measurement. For example,
certain types of knowledge are essential to driving an automobile, yet having
this knowledge in no way guarantees that one will be a skillful driver.
Measurement of driving skill should involve actual driving performance tasks
rather than knowledge-based paper-and-pencil tests. The former is a direct
measure, the latter an indirect measure from which only weak inferences about
driving skill can be drawn. To further illustrate this point, the most
direct way to determine students' skill in doing laboratory experiments is to
measure how well they perform a series of tasks essential to good laboratory
technique. Determining knowledge of laboratory procedure does not yield
direct information about lab skills.

The term "skill" is generally applied to behavior which involves specific
processes, which may or may not result in products. These processes and/or
products have qualities or characteristics which can be directly observed.
Judging the quality of these observable characteristics is what is meant by
measuring skills. Learning outcomes of this kind are best measured by means
of performance-based tests. In fact, the graduation competencies required by
Oregon's new minimum standards imply many outcomes of this sort and
necessitate the use of appropriate measurement techniques. Those described in
this chapter include: observation, rating scales, and checklists.

In observation, one notes the presence or absence of an observable behavior;
for example, a child tying a shoelace, putting things away, stacking blocks,
or completing a puzzle. These are simple behaviors, many in the psychomotor
domain.

Rating scales are numerical descriptions of behavior. The observer views a
performance or product and records a numerical judgment on a rating scale.
For example, when listening to an oral reading of Coleridge's "Rime of the
Ancient Mariner," the teacher may rate the student's inflection (vocal
variety) on a scale from one to five; one representing little inflection and
five representing excellent inflection.

Checklists resemble observation, except a checklist focuses attention on
sequences of related behaviors. A sequence might include assembling
equipment or performing a series of tasks.



Observation

There are many instances where simple observation can be used to indicate
whether or not an objective has been achieved; for example:

The student can:

1. dress self;

2. run 200 yards in less than one minute;

3. perform orally all multiplications involving one-digit numbers and two
factors;

4. read any paragraph orally with no more than two errors;

5. spell all words on the Dolch reading list without error.

In performance-based tests of these tasks, the teacher notes whether or not
the behavior has been demonstrated; there is no inference of knowledge beyond
that which is stated. The reading example calls only for reading, not
comprehension; the spelling example refers only to the specified list, not to
other words on other lists. Therefore, the desired outcomes can be directly
observed.

Observation is the simplest, most direct measurement technique known.
It should be used whenever appropriate because of its high degree of
reliability.

Rating Scales

If the quality or degree of skill achievement is to be measured, the rating
scale is useful because it is accurate and efficient.

Advantages

+ simple to use
+ easy to interpret
+ requires test-maker to clearly define that which is being measured

Disadvantages

- subject to lack of agreement among raters
- may be time-consuming to administer
- usually involves inferences



While raters tend to differ in judgments, they can learn to increase
precision in rating, and hence, agreement. Identifying what is to be rated is
perhaps the most important task in developing a rating scale. Simply
creating scales to rate something as undefined as "creativity" or "motivation"
is a dangerous practice; many interpretations can arise when a rater is
examining a related performance or product.

The rating of traits usually calls for the exercise of judgment on the part
of the observer. How well can a student adjust scientific equipment? How
effective are the techniques used by a teacher to reinforce a poorly
achieving student when homework is completed? These questions call for
subjective judgments regarding a performance or product.

Four steps are recommended in constructing a good rating scale:¹

1. Describe clearly the trait(s) to be rated in the performance or
product. For example, when rating students in a woodworking class on safe
handling of tools, the word "safe" is not enough. A more adequate
description might refer to using specific equipment, cleaning up after
class, and following rules.

2. Create a scale for each characteristic. One of the most popular and
useful types of scales is the graphic scale. For example:

How often do students clean up adequately?

frequently        about half the time        seldom

Here the rater simply marks the category that best fits the performance.

¹ The following material has been adapted from Tenbrink (1974). For a more
thorough treatment of techniques to measure performances and products, see
Tenbrink, pages 273-293.



In other instances, a verbal descriptive rating scale is used:

How well are colors used in the batik?

a. very well   b. well   c. average   d. poorly   e. very poorly

Table 3 presents a variety of response options, each based on a five-point
or a three-point scale. A five-point scale is generally preferred,
although three-point scales, seven-point scales, and even
ten-point scales have been used effectively. Three-point scales usually
do not capture the entire range of a trait while the larger point scales
call for too fine discrimination by the rater.

TABLE 3

Examples of Various Types of Rating Scales

SIMPLE NUMERICAL

Rate the following using this scale:

1 = excellent   2 = good   3 = average   4 = poor   5 = very poor

____ attention span in class
____ able to follow directions

SIMPLE GRAPHIC SCALES

Our Textbook:   □ like   □ indifferent   □ dislike

How much has the performance changed since the last time?
A. much better   B. better   C. about the same   D. worse   E. much worse

In terms of originality, how would you rate the essay you have read?
A. high   B. average   C. low

How well did the student read the passage?
A. very well   B. well   C. as well (average)   D. poor   E. very poorly

How often did the student correctly use the workbook?
A. very often   B. often   C. sometimes   D. seldom   E. never

To what degree does the exhibit contain detail?
A. very much   B. much   C. some   D. little   E. very little

The performance lacked volume.
A. strongly agree   B. agree   C. neither agree nor disagree   D. disagree   E. strongly disagree

DESCRIPTIVE SCALE

not meeting the requirements   fair but needs improving   satisfactory   doing good work   excellent job

□ develops paragraph quite well with clear-cut topic sentence
□ about average development of paragraph
□ does not develop paragraph well, frequently lacking topic sentence and
  adequate definitions



3. Arrangement of the scales on a form. Generally scales are arranged in
three ways:

a. Positive to negative

excellent     good     average     below average     poor

b. Negative to positive

poor ---------- average ---------- excellent

c. Strong to neutral to strong

very high     high     neutral     low     very low



4. Writing instructions. For consistency in later use by yourself or
others, write clear-cut instructions for administering the scale:

a. describe what is being rated and why;

b. describe how each scale is to be marked; and

c. include special directions, such as whether or not to add up scores
or make additional comments.

For example:

In the school science fair, all exhibits in the area of earth science
are being judged in terms of five criteria. Complete one of these forms
for each exhibit.

Circle the number corresponding to the description that most accurately
describes the quality of the exhibit. At the bottom of the page, sum
your ratings and place the total in the box at the bottom.
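
The "sum your ratings" instruction in the example can be sketched in a few lines. This is an illustration only: the guide says there are five criteria but does not name them, so the criterion names below are hypothetical, as are the five-point scale and the function name.

```python
SCALE = range(1, 6)  # assumed five-point scale, 1 (lowest) to 5 (highest)

# Hypothetical criteria; the guide states only that there are five.
CRITERIA = ["accuracy", "clarity", "originality", "detail", "presentation"]


def total_rating(ratings: dict[str, int]) -> int:
    """Check that every criterion has one valid rating, then sum the ratings."""
    missing = [c for c in CRITERIA if c not in ratings]
    if missing:
        raise ValueError(f"unrated criteria: {missing}")
    extra = [c for c in ratings if c not in CRITERIA]
    if extra:
        raise ValueError(f"unknown criteria: {extra}")
    for criterion, value in ratings.items():
        if value not in SCALE:
            raise ValueError(f"{criterion}: rating {value} is off the scale")
    return sum(ratings.values())


if __name__ == "__main__":
    form = {"accuracy": 4, "clarity": 3, "originality": 5, "detail": 4, "presentation": 2}
    print(total_rating(form))
```

Refusing to total an incomplete form mirrors the point of step 4: the instructions should leave no doubt about how each scale is marked before scores are added up.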






Factors Affecting the Reliability of Ratings

Once a trait is clearly described and a scale established, the rating should
be a valid reflection of student achievement. However, the following factors
may interfere with the usefulness of the rating scale:

• Lack of interest: if the rater is bored with the task, the results
  will reflect this.

• Personal bias interferes with judgment and results are liable to be
  distorted.

• Extreme options (e.g., never, always) seldom provide useful
  information. If a five-point scale contains two absolutes, they will
  rarely be selected.

• Lack of clarity about what is being rated will cause erratic
  results.

• Generosity: there is a tendency to overrate when scales are used.

• Halo: there is a tendency to give a global rating to a person and
  make ratings of sub-tasks correspond with this rating. Again, this
  distorts the results.

• Interaction errors. When a panel of judges makes independent
  ratings, high and low scores can be omitted and the remainder
  averaged.
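
The last suggestion, omitting the high and low scores from a panel and averaging the rest, can be sketched as follows. This is a minimal illustration: the function name is hypothetical, and it assumes exactly one highest and one lowest rating are dropped, which the guide does not specify.

```python
def panel_average(ratings):
    """Drop one highest and one lowest rating, then average the remainder."""
    if len(ratings) < 3:
        raise ValueError("need at least three independent ratings")
    trimmed = sorted(ratings)[1:-1]
    return sum(trimmed) / len(trimmed)


if __name__ == "__main__":
    # Five judges rate the same exhibit on a five-point scale.
    print(panel_average([1, 3, 4, 5, 5]))
```

With fewer than three judges nothing would remain after trimming, so the sketch raises an error rather than returning a misleading average.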



Checklists

As noted earlier, the measurement of many skills does not require a rating.
The presence or absence of a desired outcome could be observed or, if there
is a need to measure whether or not steps leading to an outcome have been
achieved, a checklist would be more appropriate.

EXAMPLE:

Brewing a pot of coffee:²

____ disconnect coffee pot        ____ fill basket with coffee
____ disassemble coffee pot       ____ reconnect coffee pot
____ clean pot                    ____ set dial on pot
____ fill pot with water          ____ check to see if perking

² Adapted from Mager's Measuring Instructional Intent, 1974, p. 11.
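
A checklist of this kind records two things: whether each step was observed and whether the steps occurred in order. The sketch below illustrates that idea; it is not taken from the guide. The function name is hypothetical, and the step order assumes the checklist is read down the left column and then down the right column.

```python
# Assumed order: left column top to bottom, then right column top to bottom.
COFFEE_STEPS = [
    "disconnect coffee pot",
    "disassemble coffee pot",
    "clean pot",
    "fill pot with water",
    "fill basket with coffee",
    "reconnect coffee pot",
    "set dial on pot",
    "check to see if perking",
]


def check_performance(steps, observed):
    """Return (checked, in_sequence).

    checked maps each checklist step to True if it was observed.
    in_sequence is True when the observed checklist steps occurred in
    checklist order; observed actions not on the checklist are ignored.
    """
    checked = {step: step in observed for step in steps}
    positions = [steps.index(step) for step in observed if step in checked]
    in_sequence = positions == sorted(positions)
    return checked, in_sequence


if __name__ == "__main__":
    observed = COFFEE_STEPS[:3] + COFFEE_STEPS[4:]  # student skipped the water
    checked, in_sequence = check_performance(COFFEE_STEPS, observed)
    for step, done in checked.items():
        print(f"[{'x' if done else ' '}] {step}")
    print("steps in sequence:", in_sequence)
```

Keeping the presence check separate from the sequence check lets the observer report a skipped step and an out-of-order step as different kinds of error.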



EXERCISE

Ten potential checklist items are provided below. Place an "x" in the space
if the item is visible and easily demonstrated. The student:

____ 1. reads a passage without making the following types of errors . . .

____ 2. correctly punctuates contractions, words requiring apostrophes, and
plural possessives

____ 3. is happy

____ 4. is independent

____ 5. performs each of twelve simple life skills

____ 6. puts all equipment away in correct places after experiment

____ 7. can locate each of the following types of symbols on any standard
map using the legend . . .

____ 8. is adjusted to classroom

____ 9. follows five steps in using a platform balance

____ 10. has mastered each of the five word attack skills in this unit

If you checked 1, 2, 5, 6, 7, 9, and 10, you have correctly identified
outcomes appropriate for checklists.

Constructing checklists is similar to making rating scales. Be sure you:

1. Describe the product or performance adequately.

2. List the behaviors to be observed in correct sequence.

3. Note errors which may occur in the performance.

4. Give clear directions about how to use the checklist.

The checklist is most applicable to simple, observable performances. It is
easy to develop and use, and the reliability can be high. Checklists are
becoming increasingly popular. Some school districts are even reporting
student progress to parents in terms of what students can or cannot do,
rather than using the traditional report card.



SELF-QUIZ

Mark "O" if appropriate for observation
     "R" if appropriate for rating scales
     "C" if appropriate for checklists

___ 1. How well can Jascha play the violin?

___ 2. Has Andrea been in school every day this term?

___ 3. Has George completed all seven steps correctly in the experiment?

___ 4. Can Gary follow all 12 steps in correctly reassembling a simple
       motor?

___ 5. How much improvement in soccer has Pele shown in P.E. this year?



Select the best answer. Write the letter corresponding to your choice in the
space provided at the left.

___ 6. From the standpoint of construction, which technique is most liable
       to offer the highest efficiency?
       a. rating scales
       b. checklists
       c. observation
       d. none of these

___ 7. Which requires judgment by the observer?
       a. rating scales
       b. observation
       c. checklists
       d. none of these

___ 8. A checklist must
       a. be administered by more than one person
       b. refer to intended behavior
       c. lead to a numerical result
       d. have a logical sequence of behaviors to be checked

___ 9. Observation is to checklist as
       a. direct is to indirect
       b. simple behavior is to sequential behavior
       c. inference is to induction
       d. deduction is to itemizing



Mark "+" if good practice and "0" if not, in the use of observation scales,
rating scales, or checklists.

___ 10. First, Mr. Re6 makes an overall rating of his drama class, then
        uses that rating for each aspect of student performance.

___ 11. Marion Polk believes that all history exhibits should be judged by
        at least three persons.

___ 12. Counselor Guy Nice, when judging for science fairs, insists that
        "good intentions" be considered in assigning ratings.

___ 13. Polly Gohn uses a checklist for each solution to a proof to see if
        all steps are followed.

___ 14. Atlas Shrug, the P.E. teacher, uses a rating scale to assess the
        degree to which he has achieved his goals each term.



OUR ANSWERS

 1. R        8. d
 2. O        9. b
 3. C       10. 0
 5. R       12. 0
 6. c       13. +
 7. a       14. +



CHAPTER V: MEASURING ATTITUDES

Attitude can be described in several ways: it is a tendency to act in
certain ways under certain conditions; it is also viewed as a tendency to
react emotionally, either positively or negatively. We infer the attitudes
of people from what they say or do. In this chapter, we will look at student
attitudes about school, subject matter, policies, practices, environment, and
other parts of their educational program.

Teachers are interested in attitude for two reasons. Students' perceptions
about school have much to do with what they learn; subject matter that is
disliked is not easily absorbed. Educators have begun to look at positive
attitudes as an important school outcome. Since students spend a good
portion of their childhood and adolescence in schools, the school environment
should be seen by them as generally constructive.

Approaches to Measuring Attitudes

Two methods are generally used to measure student attitudes: self-report
by the student and observation of the student by others. With the
self-report, the student is asked to tell how he or she feels. Responses can
be elicited through interviews or questionnaires of various types. The
observation method requires the teacher or some other person to observe the
student's behavior. With the anecdotal approach, essay reports are usually
employed to describe student behavior in a particular incident.

Two self-report methods are used to learn how students feel: they can be
interviewed, usually one at a time, or they can complete a questionnaire.
Interviews may not be very efficient in terms of time required or costs. A
more serious drawback is that interviewers must be carefully trained in
interview techniques if reliable information is to be obtained.

The questionnaire is more commonly used. The teacher asks questions of
students as a group, and they write their responses. Two assumptions are
necessary: the questions will mean the same thing to all respondents, and
all respondents will answer the questions honestly.

Advantages

+ can be done anonymously

+ a simple and direct way to gain information of a systematic and
  quantifiable nature

+ most reliable when raters are qualified

+ most valid in instances where objects to be rated (e.g., textbooks,
  classroom topics) are well-defined

+ can be given to groups of students instead of one student at a
  time

Disadvantages

- can be difficult to construct and analyze

- can be time-consuming to analyze

- may be subject to dishonesty by students

- cannot identify how particular individuals feel because of the
  anonymity

While a great deal of time is spent constructing useful items for the
self-report questionnaire, the instrument can be used in a variety of
situations to obtain individual or group information.

The self-report scales most commonly used are the rating scale and pair
comparisons. The rating scale is the most direct. The precision of a rating
scale tends to increase with the number of scale points provided; the most
practical number of scale points seems to be five. An odd number of scale
points (e.g., five or seven) allows for a "neutral" point which many
respondents find useful. Without a neutral point, respondents are "forced" to
respond in a definite direction, which may misrepresent their views to some
extent. For example:

How do you feel when it is time to leave school?

a. very happy   (b) happy   c. neutral   d. sad   e. very sad

One successful variation of the rating scale for elementary school students
is the use of symbols for the response options. The options are reduced to
three and represented in the following manner:

How do you feel when it is time to leave school?

[response symbols for the three options]

Studies by the Teaching Research Division, Monmouth, Oregon, confirm this
method as an effective means of ascertaining children's attitudes as early as
the first grade.



Rating scale items can be used for a variety of purposes, such as assessing
perceived learning effects, feelings, strength of feelings, or
agreement-disagreement. Some examples:

1. What effect (e.g., on your learning) does (something) have on you?

   a. very much   (b) much   c. moderate   d. little   e. very little

2. How do you feel about (something)?

   a. very satisfied   b. satisfied   c. neutral   (d) dissatisfied
   e. very dissatisfied

3. To what extent do you agree with (a statement of opinion)?

   a. strongly agree   (b) agree   c. neither agree nor disagree
   d. disagree   e. strongly disagree



The pair comparison is effective when measuring the strength of student
preferences between objects, activities, or subject matter areas. This
technique helps establish an order of expressed attitudes. The example below
seeks to determine preference of math versus reading.

Which would you rather do?

[ ] a. read a book          OR   [x] b. work math problems
[x] a. do math homework     OR   [ ] b. do reading homework



With either type of format, the following recommendations apply when
constructing the instrument:

1. Use phrases that are simple and understandable to the students.

2. Phrase statements so that some are clearly negative and some are clearly
   positive, with approximately the same number of each.

3. Keep all items relevant; avoid making the questionnaire over-long.

4. In using the pair comparison technique, be sure that choices include all
   possible combinations.
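
Recommendation 4 can be checked mechanically. The short Python sketch below
is an illustration only, not part of the original guide; the activity names
and the function name all_pairs are hypothetical. It lists every pair of
options exactly once, so a teacher can confirm that no combination has been
left off the questionnaire.

```python
from itertools import combinations


def all_pairs(options):
    """Return every unordered pair of options, each pair exactly once."""
    return list(combinations(options, 2))


# Hypothetical activities, chosen for illustration only.
activities = ["reading", "math", "science", "art"]
pairs = all_pairs(activities)

# Print the pairs in questionnaire form, one item per pair.
for number, (first, second) in enumerate(pairs, start=1):
    print(f"{number}. [ ] {first}   OR   [ ] {second}")
```

With n options there are n(n-1)/2 pairs, so four options require six items
and five options require ten. The list grows quickly, which is worth weighing
against recommendation 3.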






Scoring and Analyzing Results

The rating scale is generally scored with number values being assigned to
each of the options. If an item is reversed (that is, stated negatively
instead of positively), the number scale is reversed. The attitude measure
is the sum of the values of the responses. In the following example, student
attitude toward physical education is measured.



Circle the best answer for you.

1. How do you feel when P.E. begins? (positively stated)

   (a) happy    b. neutral    c. sad
       3           2             1

2. How do you feel when you are in P.E.? (positively stated)

   a. sad    b. neutral    (c) happy
      1         2              3

3. How do you feel when P.E. is over? (negatively stated)

   a. happy    b. neutral    (c) sad
      1           2              3

4. How would you feel if you never had to go to P.E. again? (negatively
   stated)

   (a) sad    b. neutral    c. happy
       3         2             1

SCORE: 12 (The score of 12 is the highest possible; 4 would be the
lowest possible.)
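
The scoring rule in the P.E. example can be written out as a short Python
sketch. This is an illustration only, not part of the original guide; the
names POSITIVE_VALUES, item_value, and attitude_score are hypothetical. It
assumes a three-point scale on which "happy" is worth 3 for a positively
stated item, and it reverses the scale for negatively stated items.

```python
# Values for a positively stated item on a three-point scale (assumed).
POSITIVE_VALUES = {"happy": 3, "neutral": 2, "sad": 1}


def item_value(response, negatively_stated, scale_points=3):
    """Score one response, reversing the scale for a negatively stated item."""
    value = POSITIVE_VALUES[response]
    if negatively_stated:
        # Reversal maps 3 to 1, 2 to 2, and 1 to 3.
        value = (scale_points + 1) - value
    return value


def attitude_score(answers):
    """Sum the item values; answers is a list of (response, negatively_stated)."""
    return sum(item_value(response, negative) for response, negative in answers)


# The four circled answers from the P.E. example, in item order.
pe_answers = [
    ("happy", False),  # 1. when P.E. begins (positively stated)
    ("happy", False),  # 2. when in P.E. (positively stated)
    ("sad", True),     # 3. when P.E. is over (negatively stated)
    ("sad", True),     # 4. never had to go again (negatively stated)
]
score = attitude_score(pe_answers)
print(score)  # 12, the highest possible score for four items
```

A student who dislikes P.E. on every item ("sad" on items 1 and 2, "happy" on
items 3 and 4) would receive 4, the lowest possible score.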



The pair comparison technique requires a different scoring formula. Consider
the example below of a questionnaire administered to a teacher.



Mark the box indicating which activity you prefer:

1. [x] going to school             OR  [ ] going to a faculty meeting
2. [x] attending workshops         OR  [ ] collecting milk and lunch money
3. [ ] going to school             OR  [x] attending workshops
4. [ ] going to a faculty meeting  OR  [x] collecting milk and lunch money
5. [x] attending workshops         OR  [ ] going to a faculty meeting
6. [x] going to school             OR  [ ] collecting milk and lunch money

Responses are tallied and four attitude scales are constructed.

Preference                         Frequency    Score

going to school                    //           2
going to a faculty meeting                      0
attending workshops                ///          3
collecting milk and lunch money    /            1

For this simple attitude survey, the scale for each activity runs from zero
to three. This teacher likes going to workshops best, enjoys going to school
next, and likes faculty meetings the least of the four activities. The pair
comparison approach is particularly useful for revealing relative differences
in attitude.
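
The tally for the teacher's questionnaire can be sketched in Python as
follows. This is an illustration only, not part of the original guide; the
function name pair_comparison_scores is hypothetical, and the six choices
are entered as they are read from the example. Each activity scores one
point for every pair in which it was preferred.

```python
from collections import Counter

ACTIVITIES = [
    "going to school",
    "going to a faculty meeting",
    "attending workshops",
    "collecting milk and lunch money",
]

# The activity preferred in each of the six pairs, in item order.
choices = [
    "going to school",                  # item 1
    "attending workshops",              # item 2
    "attending workshops",              # item 3
    "collecting milk and lunch money",  # item 4
    "attending workshops",              # item 5
    "going to school",                  # item 6
]


def pair_comparison_scores(activities, chosen):
    """Count how often each activity was preferred; never chosen scores 0."""
    tally = Counter(chosen)
    return {activity: tally[activity] for activity in activities}


scores = pair_comparison_scores(ACTIVITIES, choices)

# Order the activities from most preferred to least preferred.
ranking = sorted(ACTIVITIES, key=lambda activity: scores[activity], reverse=True)

for activity in ranking:
    print(f"{activity}: {scores[activity]}")
```

Because each activity appears in three of the six pairs, every score falls
between zero and three, and the six choices always add up to six points.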


Observation Methods

As described earlier in the chapter on measuring skills, observation methods
are direct, and the advantages in measuring attitudes are similar to those
for measuring skills. The observation approach requires little time for
construction or tabulation. Attitudes reflected by the general day-to-day
behaviors of students can be observed; attention can also be focused on
critical incidents and observations made of behaviors in stress situations.



General Observations 



One way to classify behavior that reflects attitude is in terms of approach
or avoidance. If a feeling toward something is positive, one tends to
seek it; when feelings are negative, one tends to avoid it. The following
observations illustrate approach and avoidance for one student in a school
setting.



1. According to his mother, Jason spends most of his time each evening
   reading new books from his school book club.

2. During recesses, Jason goes to the library to read.

3. At P.E., Jason often says he doesn't feel well.

4. After school most of the kids stay on the playground to play; Jason
   usually goes home.

5. During free time each day, Jason goes to the library.

6. Jason has ordered more new books from his school book club than any
   other student.

7. A survey of attendance shows that Jason never misses morning sessions of
   school, and frequently misses afternoon sessions.

Approach and avoidance behaviors are easy to observe, systematic in nature,
and reflect Jason's preferences. They are simple observations requiring no
inference. The sum of these observations, however, may be used to draw
inferences about Jason's attitudes toward reading and P.E.

Observation can also be used for group measures. Mager (1974, p. 89) offers
some attitude measures that fall into this category, some of which are listed
below.



1. Percentage of students completing the course.

2. Number of students volunteering to transfer into someone else's class.

3. Number of papers or projects longer than required.

4. Number of assignments completed on time.

5. Frequency of use of a particular learning center.

6. Number of students volunteering to stay after school to help.

7. Number of library books and other materials checked out on the subject.

8. Increase or decrease in school vandalism and petty theft.

These are a few of the many possible behavioral measures that teachers can
use to estimate how a teaching strategy, program, new technique, materials,
or policies may be affecting student attitudes.

Many of these measures are "unobtrusive" in that they do not interfere with
student activities or class time, and students may not be aware that the
measures are being taken. Information about students who are sensitive to
being observed is less likely to be "faked" or biased when unobtrusive
measures are used. While observation methods have the attractive quality of
being simple and direct, they can be time-consuming and demanding on the
teacher. For those wishing to further pursue attitude measurement along
these lines, consult Mager's Developing Attitude Toward Learning (see
Appendix A).



Critical Incidents 



No matter how carefully the school activities are managed, every student
experiences stress situations from time to time which we could call "critical
incidents." Careful observation of these situations can often lead to very
useful insights into student attitudes. The method differs from the general
observation approach; behaviors are observed and interpreted within the
context of the situation in which they occurred.

For example, general observation might indicate that Jeff loses his temper in
class at least once a day. Knowledge of this may let the teacher know that
Jeff needs to learn how to control his temper better. Careful observation of
each incident may suggest some good ways to help him.

There are some rules to follow in recording useful observations of this kind:

• Care must be taken to record in sequence the facts of the situation
  separately from any interpretation (e.g., "Jeff struck John in the
  face with his fist" is fact; to say Jeff was angry when he did it is
  an interpretation).

• The conditions and behaviors which led up to the incident (the
  antecedents) must be carefully recalled and noted.

• The observer's (teacher's) responses to the situation must be
  recorded (again, facts only).

• The outcome of the situation must be factually recorded.

• Interpretations should be recorded separately and based on a
  thoughtful and logical analysis of this particular situation, its
  antecedents, observer (teacher) response, and the outcome.

Periodic review of accumulated descriptions may begin to reveal patterns
of conditions under which Jeff loses his temper and conditions under which he
is able to control it. A general format might look something like the
following:

Description of Situation               Interpretative Comment

Antecedents:
Incident:
Teacher Response:
Outcome:

Caution is warranted. The critical incident method is of value only to the
person using it. It is designed to help teachers separate the facts of a
situation from the emotional impact it can have on them. Any actual
recording of descriptions should be viewed only as the teacher's own personal
notes for sorting out and creating conditions that can foster constructive
student attitudes and behaviors. As soon as they have served this purpose,
they should be destroyed.



Self-Quiz

Select the best answer. Write the letter corresponding to your choice in the
blank provided at the left.

___ 1. Which one of the following best exemplifies an attitude?
       a. choosing math over reading
       b. losing your temper when disappointed
       c. ignoring a best friend when unhappy
       d. being happy-go-lucky

___ 2. Student attitudes are important to the teacher because they are
       a. predictive of success in school
       b. important school outcomes
       c. both a and b
       d. neither a nor b

___ 3. Which one of the following is a major disadvantage of a self-report?
       a. lacks precision
       b. tendency of some students to fake answers
       c. time-consuming to construct the instrument
       d. is group-administered

___ 4. A pair comparison technique is useful for
       a. making comparisons between two objects
       b. a self-report, when rating scale is inappropriate
       c. finding the strength of a preference
       d. determining the order of preferences for a set of objects

___ 5. Which of the following is generally false?
       a. Self-reports are more desirable than interviews from the
          standpoint of efficiency.
       b. A pair comparison is a more indirect measure than a rating
          scale.
       c. Training of raters improves the precision of ratings.
       d. Observations are typically very objective measures.

___ 6. Critical incident observations are well-done when
       a. the record contains some impressionistic information
       b. a complete description of a series of events and the student
          is done
       c. psychological interpretations of actions are made
       d. observed behavior of a student is described and interpreted
          from beginning to final outcome.



OUR ANSWERS 



APPENDIX A

ANNOTATED BIBLIOGRAPHY

The following publications represent a sampling of recent and significant
contributions to the technology of testing.

Baker, E. L., and Popham, W. J. Expanding Dimensions of Instructional
     Objectives. Englewood Cliffs, Prentice-Hall, 1973. This brief text
     describes the role of objectives in education, the use of needs
     assessment in choosing goals, and some affective objectives. Writing
     tests to measure objectives is also discussed.

Block, J. H. (Ed.) Mastery Learning: Theory and Practice. New York, Holt,
     Rinehart, and Winston, 1971. A thorough review of studies attesting to
     the benefits and deficits of mastery learning approaches to instruction.

Bloom, B. S. Learning for Mastery. Evaluation Comment, 1968, 1, 1-12. One
     of the earliest and most cogent writings on mastery learning in American
     education.

Dizney, H. Classroom Evaluation for Teachers. Dubuque, William C. Brown,
     1971. A brief basic text on achievement testing which covers the
     fundamentals of testing.

• . . - y I ■ ' 

Ebel, R. Essentials of Educational Measurement. Englewood Cliffs,
     Prentice-Hall, 1972. A standard text on constructing and using
     teacher-made tests. The chapter on true-false testing is one of the
     best treatments of the subject.

Gronlund, N. E. Improving Marking and Reporting in Classroom Instruction.
     Riverside, Macmillan, 1974. This is a very practical little booklet
     which contains information about grading practices, systems, and
     guidelines for collecting and using data to make criterion-referenced
     and norm-referenced decisions.

Gronlund, N. E. Individualizing Classroom Instruction. Riverside, Macmillan,
     1974. This booklet, one of a series by the author, describes the role
     of testing in individualized instruction. He describes several
     currently operating systems which are individualized.

Gronlund, N. E. Preparing Criterion-Referenced Tests for Classroom
     Instruction. Riverside, Macmillan, 1973. This book is one of the few
     that attempts to describe how criterion-referenced tests are actually
     constructed and used.

Hively, W. Introduction to domain-referenced testing. Educational
     Technology, 1974, 14. This is one of the most readable accounts of
     domain-referenced testing as it presently exists.



Keller, F. S. Goodbye, teacher . . . Journal of Applied Behavior Analysis,
     1968, 1, 79-89. An account of Fred Keller's Personalized Student
     Instruction (PSI), one which motivated a movement in college instruction
     away from traditional ways toward competency-based approaches.

Kibler, R., Barker, L. L., and Miles, D. T. Objectives for Instruction and
     Evaluation. Boston, Allyn and Bacon, 1974. This book provides the
     rationale for the use of behavioral objectives in systematic
     instruction. It also contains information on the selection, writing, and
     appropriate uses of objectives in teaching.

Mager, R. F. Developing Attitude Toward Learning. Belmont, Fearon Inc.,
     1968. If ever a book were written for teachers that presents the
     reasons for being concerned about children's attitudes and how to
     measure them, this is the one. The book is both relevant and enjoyable
     to read.

Mager, R. F. Goal Analysis. Belmont, Fearon Inc., 1972. In Mager's
     entertaining way, he describes appropriate methods to analyze the nature
     and demands of any goal. The applicability of this book to education is
     not apparent. But there are implications that need to be realized.

Mager, R. F. Measuring Instructional Intent. Belmont, Fearon Inc., 1973.
     A common sense approach is taken by Mager in this book to drafting test
     items which directly reflect your instructional intent. Some important
     distinctions are drawn between norm- and criterion-referenced tests.

Miller, H., Williams, R., and Haladyna, T. Beyond Facts: Objectively
     Measuring Higher Level Thinking, in press. The authors present and
     develop a system for writing multiple choice test questions that measure
     higher level thinking. Examples, exercises, and self-quizzes are used to
     develop the item writer's skills.

Popham, W. J., and Baker, E. L. Establishing Instructional Goals. Englewood
     Cliffs, Prentice-Hall, 1970. This brief text describes the role of
     objectives in systematic instruction. It is replete with examples and
     exercises.

Popham, W. J., and Baker, E. L. Planning an Instructional Sequence.
     Englewood Cliffs, Prentice-Hall, 1970. This book describes the way
     objectives are used in instruction. Like others in the series, it is
     rich with examples and exercises.

Popham, W. J., and Baker, E. L. Systematic Instruction. Englewood Cliffs,
     Prentice-Hall, 1970. This book, from a series by the two authors,
     represents an overall look at modern instructional approaches which
     require the use of objectives and objective-based tests.



Popham, W. J., and Husek, T. R. Implications of criterion-referenced
     measurement. Journal of Educational Measurement, 1969, 6, 1-9. A more
     technical treatment of criterion-referenced testing, one of the earliest
     and most authoritative.

Sanders, N. Classroom Questions. Scranton, Harper and Row, 1966. This book
     is devoted to developing the idea that test items can be constructed to
     represent levels of Bloom's cognitive taxonomy. The book has many
     examples.

■ . , i:/. - , : ' .... 

Southwest Regional Laboratory for Educational Research and Development.
     Educational Criterion Measures. Cincinnati, Van Nostrand, 1971. This
     short booklet is one of seven in a series called Instructional Product
     Development. It is useful in suggesting a wide variety of
     nonpaper-and-pencil measures of school behavior.

________. Stating Educational Outcomes. Cincinnati, Van Nostrand, 1971.
     Another booklet in the Instructional Product Development series which
     describes how to effectively write or select objectives. It comes
     complete with objectives, examples, and exercises.



TenBrink, T. D. Evaluation: A Practical Guide for Teachers. New York,
     McGraw-Hill, 1974. This book, a useful general guide for evaluation,
     contains a very concise and practical treatment of questionnaire
     development and sociometric instrument use. The book also deals with the
     problems of constructing teacher-made tests.

Webb, E. J., Campbell, D. T., Schwartz, R. D., and Sechrest, L. Unobtrusive
     Measures. Chicago, Rand McNally, 1966. Novel methods for gaining
     information for a wide variety of purposes are discussed in this book in
     a most entertaining and interesting manner.



APPENDIX B

GLOSSARY OF TERMS

AFFECTIVE DOMAIN: One of three aspects of human behavior, dealing with
     attitudes, values, sentiments, feelings, personality, and other similar
     concepts. Cognitive behavior may be importantly related to affective
     behavior, but the concern in measuring affective behavior is that of
     ascertaining the degree of attitude, value, etc.

ATTITUDES: A tendency to react emotionally to an object in a positive,
     neutral, or negative way.

CHECKLISTS: A device for noting the presence or absence of behaviors which
     are sequentially related or conceptually organized. Used to study
     skills in the cognitive domain and aspects of affective behavior and
     psychomotor behavior.

CLASSROOM ACHIEVEMENT TEST: Any test expressly designed to measure the
     learning that has occurred as a direct result of classroom instruction.

COGNITIVE DOMAIN: One of three aspects of human behavior, dealing with
     intellectual activities.

CONSTRUCTED RESPONSE TEST: A test deliberately constructed so that the
     student must compose the answer. Used primarily to measure achievement
     of knowledge in the cognitive domain.

DOMAIN-REFERENCED TEST: Any test constructed by sampling from a collection
     (pool) of items related to a specific content domain.

EFFICIENCY: A test characteristic. A test is highly efficient if it is
     simple and inexpensive to plan, construct, administer, score, and
     report.

GROUP-REFERENCED: A manner in which achievement data may be referenced.
     That is, student performance is compared with the performance of others
     via group statistics such as the mean (average), median, or mode.

INSTRUCTIONAL OBJECTIVES: A statement of instructional intent which includes
     an action verb, conditions under which the student must perform, and a
     desirable level of performance.

OBSERVATION: A method for measuring simple types of skills, attitudes, or
     psychomotor behaviors.



PSYCHOMOTOR DOMAIN: One of three aspects of human behavior, dealing
     primarily with physical (psychomotor) performance. One should recognize
     that cognitive behavior is certainly related to psychomotor behavior.
     Measures of psychomotor behavior are primarily focused on the physical
     aspects rather than knowledge.

RATING SCALE: A device for developing numerical descriptions of skills,
     attitudes, or psychomotor behaviors. These numerical descriptions are
     essentially judgmental in nature.

RELIABILITY: A test characteristic. A test is reliable if it yields scores
     which are precise and consistent over time (i.e., stability,
     repeatability, and consistency of the measurement).

SELECTED RESPONSE TEST: A test deliberately constructed so the student must
     choose the correct answer from a set of options.

SKILLS: Attributes or characteristics inferred or directly observed through
     the consideration of performances or products. Observation, rating
     scales, or checklists are customarily used to measure skills.

TEST: A type of measurement where the conditions for the measurement are
     uniform for all examinees.

VALIDITY: A test characteristic. A test is valid if it measures what it is
     purported to measure.



APPENDIX C

A FEW ITEM SOURCES

Northwest Evaluation Association Item Bank
c/o Dr. Frederick V. Forster
PO Box 3107
Portland, OR 97208
(503) 234-3392

Education Commission of the States
National Assessment of Educational Progress (NAEP)
300 Lincoln Tower
1860 Lincoln Street
Denver, CO 80203

Instructional Objectives Exchange (IOX)
Box 24095, Department V
Los Angeles, CA 90024

Minnesota Educational Assessment
Minnesota Department of Education
Minneapolis, MN 55435

Michigan Department of Education
State Department of Education
Lansing, MI 48901



MEASURING PERFORMANCE: TEACHER MADE TESTS 



YOUR VIEWS ARE IMPORTANT! After you read and examine this publication, please forward your comments to the
publications staff of the Oregon Department of Education. If you would rather talk by telephone, call us at 378-4776.
Or, for your convenience, this response form is provided.

PLEASE RESPOND so that your views can be considered as we plan future publications. Simply cut out the form, fold
and mail it back to us. We want to hear from you!



Did you read this publication?

___ Completely
___ More than half
___ Less than half
___ Just skimmed

Does this publication fulfill its purpose as stated in the
preface or introduction?

___ Completely
___ Partly
___ Not at all

Did you find this publication useful in your work?

___ Often
___ Sometimes
___ Seldom
___ Never

Which section is most valuable? ___

Did you find the content to be stated clearly and
accurately?

___ Always yes
___ In general, yes
___ In general, no
___ Always no
___ Other ___

What type of work do you do?

___ Classroom teacher
___ Consultant to classroom teachers
___ School administrator
___ Other ___

Were the contents presented in a convenient format?

___ Very easy to use
___ Fairly easy
___ Fairly difficult
___ Very difficult
___ Other ___

Did you find this publication to be free of discrimination
or biased content towards racial, ethnic, cultural and
religious groups, or in terms of sex stereotyping?

___ Yes, without reservations
___ Yes, with reservations
___ Other ___

What is your impression of the overall appearance of the
publication (graphic art, style, type, etc.)?

___ Excellent
___ Good
___ Fair
___ Poor

Would you recommend this publication to a colleague?

___ Yes, without reservations
___ Yes, with reservations
___ Other ___

When this publication is revised, what changes would you like to see made? ___

Additional comments. (Attach a sheet if you wish.)

Thanks!



Fold here and seal

BUSINESS REPLY MAIL

Communications/Government Relations
Oregon Department of Education
942 Lancaster Drive NE
Salem, Oregon 97310



